ABSTRACT

The problem of prior and delayed commitment in zero sum stochastic
differential games is discussed. A new formulation and solution based on
the delayed-commitment model is derived and its significant implications
for stochastic games and control are considered.


1.  Introduction 


One of the fundamental tenets of game theory is the Normalization
Principle of Von Neumann, which roughly says that given an extensive
game one can always reduce it to an equivalent game in normal form
involving only strategies and payoffs, where all dynamic and
informational aspects of the original problem have been suppressed in
the form of strategies by considering all the possible actions of all the
players under all possible circumstances. As a conceptual simplification
this device is extremely useful. In fact it is so useful that one can
argue that it has disproportionately influenced the development of game
theory in the past two decades, with the result that very little work has
been done on the extensive form of games. Recently, Aumann and
Maschler [1] reexamined the normalization principle and pointed out
persuasively, via a simple counterexample, its inappropriateness
under certain conditions. Their results have immediate and serious
consequences in stochastic control and differential game problems since
both are special cases of general extensive games. In this paper we
shall:

(i) present a counterexample in the same spirit as that of [1] but
within the framework of a zero sum stochastic two person difference
game. This example will point out the restricted circumstances under
which earlier results on minimax strategies can be considered secure,

(ii) point out that (i) is actually a blessing in disguise and that from
our new viewpoint we can actually solve the minimax problem for two
person zero sum Linear-Quadratic-Gaussian stochastic differential
(difference) games much more effectively than before. A finite dimensional
minimax solution that is eminently computable will be presented,

(iii) show that the structure of the well known optimal stochastic
control law (Kalman-Bucy filter in cascade with a zero memory linear
map) for the LQG problem is in fact "optimal" under circumstances which
are neither gaussian nor linear. This explains in part the incredible
robustness of the LQG result in practical application and points the way
to efficient solution of more general stochastic control problems.

2.  The  Example 

The notation we shall use in this section is as follows: we write
$\tilde{x}$ to denote the fact that we are considering it as a random variable,
while the plain $x$ indicates a particular sample of $\tilde{x}$; $\bar{x}$ then denotes the
expected value of $\tilde{x}$; in particular, $\bar{x}'$ stands for the unconditional (prior)
expectation of $\tilde{x}$ and $\bar{x}''$ for the conditional (posterior upon information
obtained as the game evolved) expectation.

Consider the scalar two stage dynamic system

$$x_3 = x_2 + v = (x_1 + u) + v, \qquad x_1 \equiv \tilde{x} \sim N(0, \sigma) \tag{1}$$

where u and v are the controls of players I and II respectively. We have
the performance criterion

$$J' = \tfrac{1}{2} E\left[(x_3)^2 + u^2 - 2v^2\right] \tag{2}$$

which I attempts to minimize and II to maximize. Player I is given the
measurement

$$\tilde{z} = \tilde{x} + \tilde{w}, \qquad \tilde{w} \sim N(0, 1) \tag{3}$$

in the sense to be explained more fully in section 4.




$$u^\circ = -\tfrac{1}{2} E(\tilde{x}/z) = -\tfrac{1}{2}\bar{x}''$$

Similarly for

$$u = \gamma^\circ(z) = -\frac{\sigma}{2(\sigma + 1)}\, z$$


~  4  ~ 
z  =  a  z 


Max  j ,  _  Max  -  v2  +  (2az  +  2x )  v  +  2a'z  5?} 


y{4a  -  v  +  2a]  =>  v°  =  c°  =  0 


J  Mv°,  c°)  =  2a  +  a 


The saddle point property of $(\gamma^\circ, c^\circ)$ is thus established. Concomitant
with this saddle point property, it is often asserted or implied that if
player I chooses the strategy $\gamma^\circ$ then he is guaranteed a minimax expected
payoff value of (10) above. This statement has to be interpreted with
considerable care as the following discussion will show. Let us consider
the situation facing player I after he has received the information z but
before anyone has acted. Instead of (2)', his payoff is now evaluated by

$$J'' = \tfrac{1}{2} E_{/z}\left\{2u^2 - v^2 + 2uv + 2v\tilde{x} + 2\tilde{x}u\right\} \tag{11}$$

To be sure, if player II uses $v^\circ = c^\circ = 0$, then the optimal act for
player I is still given by (8), i.e., $u^\circ = -\tfrac{1}{2}\bar{x}''$. However, this action does
not guarantee his security level, which is obtained by solving

$$J''(u^*, v^*) = \min_{u \in R}\ \max_{v \in R}\ J'' \tag{P-2}$$

Note that in (P-2), z is no longer a random variable but a given number.

To solve (P-2), we shall derive $u^*$ and $v^*$ as a saddle point pair for $J''$.
For the purpose of solving the ZSTP game of (P-2), z can be regarded
as part of the common prior information without violating the restriction
of (5) on the class of admissible strategies for v. For fixed u, maximizing
$J''$ over v gives

$$v^* = u + \bar{x}'' \tag{12}$$

Substituting (12) into (11), the minimization of $J''$ over u becomes

$$\min_u\ \tfrac{1}{2} E_{/z}\left\{2u^2 - (u + \bar{x}'')^2 + 2u(u + \bar{x}'') + 2\tilde{x}(u + \bar{x}'') + 2\tilde{x}u\right\} \tag{13}$$

which gives

$$u^* = -\tfrac{2}{3}\bar{x}'', \qquad v^* = u^* + \bar{x}'' = \tfrac{1}{3}\bar{x}'' \tag{14}$$

and

$$J''(u^*, v^*) = -\tfrac{1}{6}\left[\bar{x}''\right]^2 = -\frac{\sigma^2}{6(\sigma + 1)^2}\, z^2 \tag{15}$$

Similarly for fixed $v^* = \tfrac{1}{3}\bar{x}'' = \frac{\sigma}{3(\sigma+1)}\, z$, we can directly verify that
$u^* = -\tfrac{2}{3}\bar{x}''$ is the optimal reply and yields the security level of (15).
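The saddle point computation above can be checked with exact arithmetic. The following is a minimal sketch, assuming the quadratic form of Eq. (11) with the constant term $E(\tilde{x}^2/z)$ dropped and with `xbar` standing for the posterior mean $\bar{x}''$; the function names are illustrative and not part of the original derivation.

```python
from fractions import Fraction


def posterior_payoff(u, v, xbar):
    """J'' of Eq. (11) up to the constant E[x^2 | z]; xbar plays the role of E[x | z]."""
    return Fraction(1, 2) * (2 * u * u - v * v + 2 * u * v + 2 * v * xbar + 2 * xbar * u)


def best_reply_v(u, xbar):
    """Maximizer of J'' over v for fixed u, Eq. (12)."""
    return u + xbar


def best_reply_u(v, xbar):
    """Minimizer of J'' over u for fixed v (stationarity: 4u + 2v + 2*xbar = 0)."""
    return -(v + xbar) / 2


def security_level(u, xbar):
    """Payoff I must expect if II answers u with the best reply of Eq. (12)."""
    return posterior_payoff(u, best_reply_v(u, xbar), xbar)


xbar = Fraction(3)                    # an arbitrary sample value of the posterior mean
u_star = Fraction(-2, 3) * xbar       # posterior strategy, Eq. (14)
v_star = best_reply_v(u_star, xbar)   # equals xbar / 3
u_prior = Fraction(-1, 2) * xbar      # prior strategy u° = -xbar / 2

print(security_level(u_star, xbar), security_level(u_prior, xbar))
```

Under these assumptions the pair $(u^*, v^*)$ is a mutual best reply, and the worst case payoff of the prior strategy is larger (worse for the minimizer) than that of the posterior strategy.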

On the other hand, the strategy $u^\circ = -\tfrac{1}{2}\bar{x}''$ against $v^* = u + \bar{x}''$ produces
a payoff

$$J''(\gamma^\circ, v^*) = -\tfrac{1}{8}(\bar{x}'')^2 > J''(u^*, v^*) = -\tfrac{1}{6}(\bar{x}'')^2 \tag{16}$$

as the case should be. The inequality of (16) is disconcerting. It says
that for all possible values of z, the strategy $u^*$ is actually a safer
strategy than $\gamma^\circ$. Unless I has reason to believe that II has irrevocably
committed himself to $v^\circ = c^\circ$, or that I can convince II that he has
irrevocably committed himself to $\gamma^\circ$, there is no reason at all to play $\gamma^\circ$
when $u^*$ is safer and available. The reason for this phenomenon, as
pointed out by Harsanyi [2] and Aumann and Maschler [1], is the problem
of prior and delayed (posterior) commitment. Put another way, after
the information is received we really have a nonzero sum game facing
the two players, with (11) the payoff for I and (2)' for II. The strategy
pair $(\gamma^\circ, c^\circ)$ is an equilibrium pair for I and II (in the Nash sense).
However, it is well known that equilibrium strategies do not in general
possess any minimax or guaranteed value properties in nonzero sum
games. The above example is simply one illustration of this fact. If
the game takes place at a very fast time scale such that human reactions
are not practical and mechanical decision is necessary, then the prior
strategy pair $(\gamma^\circ, c^\circ)$ represents a reasonable solution. On the other
hand, in many socio-economic multistage games, the idea of a purely
mechanistic decision procedure with no human intervention and irrevocable
commitment to a strategy is rather untenable when confronted with the
kind of evidence in (16). In such cases, the posterior strategy $u^*$ seems
much preferable. Of course, one may counter with the argument
that since both the prior strategy and the posterior strategy for II from
II's viewpoint are the same, $v^\circ = c^\circ = 0$, we should expect him to play
it; hence I should play $\gamma^\circ$. This reasoning is defective on two accounts:

(i) I is dependent on II's intelligence (i.e., II is clever enough to
compute both the prior and posterior optimal strategies) for his payoff.
But what if II is dumb but lucky enough to play $v^*$?

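(ii) Suppose we endow II with the measurement

$$\tilde{y} = \tilde{x} + \tilde{\varepsilon}, \qquad \tilde{\varepsilon} \sim N(0, 1), \qquad \tilde{\varepsilon},\ \tilde{w},\ \tilde{x} \text{ are independent,} \tag{17}$$

then in general II will not have the same prior and posterior strategies.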

In fact, it can be shown that from the viewpoint of player I,

$$u = \gamma^\circ(z) = -\frac{\sigma(1 + 2\sigma)}{2(\sigma + 1)^2 + \sigma^2}\, z, \qquad v = \beta^\circ(y) = \frac{\sigma(\sigma + 2)}{2(\sigma + 1)^2 + \sigma^2}\, y \tag{18}$$

constitutes a saddle point for $J'$ and

$$u^* = -\frac{2\sigma}{3(\sigma + 1)}\, z, \qquad v^* = u + E_{/y,z}[\tilde{x}] \equiv u + \bar{x}'' \tag{19}$$

is a saddle point pair for $J''$. Note $\gamma^\circ(z) \neq u^*$ and $J''(\gamma^\circ, v^*) > J''(u^*, v^*)$.
Furthermore, from the viewpoint of player II, he faces a payoff

$$J_{II}'' \triangleq E_{/y}[J] \neq J'' \triangleq E_{/z}[J] \tag{20}$$

Since I does not have knowledge of $J_{II}''$, there is no compelling reason
to assume that II will play $v^\circ$ unless I believes in prior commitment. In
fact, the "optimal" action from II's viewpoint may just turn out to be
numerically equal to $v^*$. In other words, I need not assume II is malicious
in order to prepare for the worst.
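The prior saddle point gains for the case where II also has a measurement can be cross-checked against the best-reply conditions. The sketch below assumes linear strategies $u = a z$, $v = b y$, the payoff (2), unit measurement noise variances, and the conditional means $E(\tilde{x}/z) = k z$, $E(\tilde{x}/y) = k y$ with $k = \sigma/(\sigma+1)$; the stationarity conditions $a = -k(1+b)/2$ and $b = k(1+a)$ are derived here from (2) under those assumptions and are not quoted from the original. The closed forms in `closed_form_gains` are the gains as read from Eq. (18).

```python
from fractions import Fraction


def prior_saddle_gains(sigma):
    """Solve a = -k(1 + b)/2 and b = k(1 + a) for the linear gains (a, b)."""
    k = sigma / (sigma + 1)            # E[x | z] = k z and E[x | y] = k y
    a = -k * (1 + k) / (2 + k * k)     # obtained by eliminating b
    b = k * (1 + a)
    return a, b


def closed_form_gains(sigma):
    """Gains of u = a z, v = b y as read from Eq. (18)."""
    den = 2 * (sigma + 1) ** 2 + sigma ** 2
    return -sigma * (1 + 2 * sigma) / den, sigma * (sigma + 2) / den


def posterior_gain(sigma):
    """Gain of the posterior strategy u* = -(2/3) E[x | z]."""
    return Fraction(-2, 3) * sigma / (sigma + 1)


sigma = Fraction(2)
print(prior_saddle_gains(sigma), closed_form_gains(sigma), posterior_gain(sigma))
```

For $\sigma = 2$ this gives the prior gain $-5/11$ against the posterior gain $-4/9$, so the two strategies for I differ, in line with the remark $\gamma^\circ(z) \neq u^*$.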

3. Some Preliminaries to Stochastic Differential Games

At first glance, the result of section 2 seems to spell disaster for
practically all previous work on the stochastic (in particular Linear-
Quadratic-Gaussian) differential game problem. The "minimax" or
saddle point strategies that have been obtained are all of the "prior"
variety. They are useful or reasonable only if we have firm belief that
our opponent has made irrevocable prior commitments before the game
has begun. This severely limits their applicability, not to mention the
fact that in general these strategies can only be realized with infinite-
dimensional dynamic systems [3], which are hardly practical. We would
like to show in the sections below that our new awareness is actually a
blessing in disguise and that a secure "posterior" strategy can be
derived for both players that is both simple and realizable by finite
dimensional linear systems.

Before we describe the problem formulation in the game situation,
let us recall a few facts for the one-player linear-quadratic-gaussian
stochastic control problem which we shall require later.*

Consider the finite dimensional linear stochastic dynamic system
described by the Ito stochastic differential equation

$$dx = A(t)x\,dt + B(t)u\,dt + C(t)\,dw(t), \qquad x(t_0) \sim N(\bar{x}_0, P_0) \tag{1}$$

$$dz = H(t)x\,dt + F(t)\,d\xi(t) \tag{2}$$

where A, B, C, H, F are known n×n, n×m, n×r, p×n, p×q matrices
whose elements are continuous on $[t_0, t_f]$ and F is of full rank with $q \ge p$
for all t; $w(t)$ and $\xi(t)$ are independent standard Wiener processes. We
also consider the payoff

$$J' \triangleq E(J) = \tfrac{1}{2} E\Big\{ x(t_f)^T S_f\, x(t_f) + \int_{t_0}^{t_f} \big( u(t)^T R\, u(t) + x(t)^T M x(t) \big)\, dt \Big\} \tag{3}$$

where $S_f \ge 0$, $M(t) \ge 0$, $R(t) > 0$ are n×n, n×n, m×m symmetric
matrices whose elements are continuous on $[t_0, t_f]$.

First we have the following well known result.

Result 1. $x(t)$ and $z(t)$ are measurable separable gaussian random
processes with values in $R^n$ and $R^p$ respectively and each having
continuous sample paths with probability one [4, pp. 135-136].

*Readers well versed in control theory or engineering can skip the
technical specifications below and go directly to the next section.

Next we shall define the class of admissible control laws (strategies), $\Gamma$.
Let I denote $[t_0, t_f]$; $C[t_0, t]$ the space of continuous functions
on $[t_0, t]$; $Z_t$ the minimal σ-algebra generated by $z_t \in C[t_0, t]$, i.e.,

$$Z_t = \sigma\{ z(s),\ s \in [t_0, t] \}.$$

An admissible control law is a functional $\gamma: I \times C[t_0, t] \to R^m$
such that $\gamma(\cdot, z_t)$ is Lebesgue measurable for each $z_t \in C[t_0, t]$
and $\gamma(t, \cdot)$ is $Z_t$-measurable for all $t \in I$. Essentially this means
that the control u at t can only depend on the past and present values
of the measurement history $z_t$. With the above setup there follow the
next two well known results.

Result 2. (Kalman-Bucy Filtering) [5] The conditional mean of $x(t)$,
$\hat{x}(t) \triangleq E(x(t)/Z_t)$, is given by

$$d\hat{x} = (A(t)\hat{x} + B(t)u)\,dt + P(t)H^T(FF^T)^{-1}(dz - H(t)\hat{x}\,dt), \qquad \hat{x}(t_0) = \bar{x}_0 \tag{4}$$

where $P(t)$ satisfies the DE

$$\dot{P} = AP + PA^T + CC^T - PH^T(FF^T)^{-1}HP, \qquad P(t_0) = P_0 \tag{5}$$

Corollary [5, pp. 70-72] If in addition (A, H) constitutes an observable
pair, i.e.,

$$\int_{t}^{t_f} \Phi^T(\tau, t_f)\, H^T (FF^T)^{-1} H\, \Phi(\tau, t_f)\, d\tau > 0 \qquad \forall\, t < t_f \tag{6}$$

where $\Phi(t, \tau)$ is the fundamental matrix associated with $A(t)$, then $P(t)$
exists and is bounded for all $t \ge t_0$.
Result 3. (The Separation Principle) The optimal control law $\gamma \in \Gamma$ which
minimizes (3) [5, pp. 100-101] subject to (1) and (2) is given by

$$u(t) = \gamma(t, z_t) = -R^{-1}B^T S(t)\,\hat{x}(t) \tag{7}$$

where

$$\dot{S} = -A^T S - SA - M + SBR^{-1}B^T S, \qquad S(t_f) = S_f \tag{8}$$

Corollary [5, pp. 98-99] If in addition (A, B) constitutes a controllable
pair, i.e.,

$$\int_{t_0}^{t} \Phi(t, \tau)\, B R^{-1} B^T\, \Phi^T(t, \tau)\, d\tau > 0 \qquad \forall\, t > t_0 \tag{9}$$

then $S(t)$ exists and is bounded for all $t \le t_f$.

Operationally, what these results say is that the optimal control law can
be realized by linear combinations (Eq. (7)) of the state $\hat{x}(t)$ of a linear
finite dimensional dynamic system (Eq. (4)) which has as its input $z(t)$.
This is one of the most successful and widely used results in control
theory.
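For a scalar system with constant coefficients, the precomputation behind Results 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions: all matrices are taken to be scalars, the Riccati equations (5) and (8) are integrated with a fixed-step Runge-Kutta scheme, and the helper names (`rk4`, `filter_variance`, `control_riccati`, `controller_gains`) are hypothetical.

```python
def rk4(f, y0, t_start, t_end, steps):
    """Integrate dy/dt = f(t, y) from t_start to t_end (t_end < t_start runs backward)."""
    h = (t_end - t_start) / steps
    t, y = t_start, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def filter_variance(A, C, H, F, P0, t0, t, steps=2000):
    """Scalar form of Eq. (5), integrated forward from P(t0) = P0."""
    return rk4(lambda _t, P: 2 * A * P + C * C - (P * H) ** 2 / (F * F), P0, t0, t, steps)


def control_riccati(A, B, M, R, Sf, tf, t, steps=2000):
    """Scalar form of Eq. (8), integrated backward from S(tf) = Sf."""
    return rk4(lambda _t, S: -2 * A * S - M + (S * B) ** 2 / R, Sf, tf, t, steps)


def controller_gains(A, B, C, H, F, M, R, P0, Sf, t0, tf, t):
    """Filter gain P H / F^2 of Eq. (4) and feedback gain -B S / R of Eq. (7) at time t."""
    P = filter_variance(A, C, H, F, P0, t0, t)
    S = control_riccati(A, B, M, R, Sf, tf, t)
    return P * H / (F * F), -B * S / R


# Example: A = 0, C = 0, M = 0 and unit B, H, F, R admit the closed forms
# P(t) = P0 / (1 + P0 (t - t0)) and S(t) = Sf / (1 + Sf (tf - t)).
filter_gain, feedback_gain = controller_gains(
    A=0.0, B=1.0, C=0.0, H=1.0, F=1.0, M=0.0, R=1.0,
    P0=2.0, Sf=1.0, t0=0.0, tf=1.0, t=0.5)
print(filter_gain, feedback_gain)
```

Both gains depend only on the model parameters and on time, not on the measurements, which is why they can be computed before the controller is run.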

In the next section we shall be using results 2 and 3 extensively.
In order to avoid cumbersome notation, we shall display these two results
graphically to highlight their significance. This is done in Figure 1.
The optimal controller for the linear dynamic system (block ①) is
another linear dynamic system of the same dimension (block ②) followed
by a static linear map (block ③). Dotted lines indicate major parameter
inputs to the controller which are pre-computed via Eqs. (5) and (8).
In the sequel, we shall only utilize results 2 and 3 in the form of
Figure 1 and avoid spelling out the various detailed parameter matrices
associated with each block.


Figure 1. Graphical Representation of Results 2 and 3: the optimal
stochastic controller for Eq. (1) which minimizes (3), with the parameter
input S(t) from Eq. (8).


4. A New Formulation and Solution of the Linear-Quadratic-Gaussian
Stochastic Differential Games

In the LQG games, instead of Eq. (3.1) we have

$$dx = (A(t)x + B(t)u + D(t)v)\,dt + C\,dw(t) \tag{1}$$

where D is an n×s matrix similarly defined for the control input, v, of
player II. I and II are endowed with measurements

$$dz = Hx\,dt + F\,d\xi(t) \tag{2a}$$

$$dy = Gx\,dt + K\,d\varepsilon(t) \tag{2b}$$

respectively, where (2b) is similarly defined as (3.2) with $\varepsilon(t)$ an
independent Wiener process; K is k×l ($l \ge k$) and of full rank.

The payoff is similarly defined with

$$J' = \tfrac{1}{2} E\Big\{ x^T(t_f)\, S_f\, x(t_f) + \int_{t_0}^{t_f} \big[ u^T R u - v^T Q v + x^T M x \big]\, dt \Big\} \tag{3}$$

where $Q > 0$ for all $t \in I$ and the addition of the $-v^T Q v$ term is due to the
fact that v is maximizing. The strategy class, $\Gamma_u$, for u is the same as
before and $\Gamma_v$ is similarly defined for v, i.e., $\beta(t, \cdot)$ is $Y_t$-measurable
for all t and $\beta(\cdot, y_t)$ is Lebesgue measurable for each $y_t \in C[t_0, t]$.

The minimax strategy pair $(\gamma^\circ, \beta^\circ)$ has been formally obtained earlier
in [3]. They are infinite dimensional in the sense that block ② in
Figure 2 for each player can only be realized by linear dynamic systems
which are describable by partial rather than ordinary linear differential
equations.

In terms of our discussions in section 2, $(\gamma^\circ, \beta^\circ)$ are strategies
of the prior commitment type. After the game has started, at time t
and from the viewpoint of player I the payoff now becomes

$$J'' \triangleq E_{/Z_t}(J) \tag{4}$$

While $(\gamma^\circ, \beta^\circ)$ still retain their equilibrium property, they are no longer
secure strategies. The question then arises as to what secure
strategy player I can adopt. Note that in (4), for fixed $\gamma$, $\beta$, $J''$ is
parameterized by the observation history $z_t \in Z_t$. For the purpose
of computing the security payoff of (4), $z_t$ is merely part of the prior
information. It is reasonable to base the computation on the knowledge
of $z_t$, i.e., we assume the admissible strategy class of $\beta$ to include
$Z_t$-measurable functions in addition to being $Y_t$-measurable. This
amounts to saying that in calculating his control we shall assume that
player II, either through divine guidance or a perfect spy, has access to
player I's information. We submit that this is an eminently reasonable
viewpoint to take for the purpose of calculating player I's security payoff.
To be sure, we may endow player II with additional information pertaining
to the problem, e.g., we may assume that II also knows $w(t)$ or $\xi(t)$.
However, such assumptions are less natural.

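Summarizing then, we wish to find $\gamma^* \in \Gamma_u$, $\beta^* \in \Gamma_u \times \Gamma_v$
such that

$$J''(\gamma^*, \beta^*) = \min_{\gamma \in \Gamma_u}\ \max_{\beta \in \Gamma_u \times \Gamma_v} J''(\gamma, \beta) \tag{5}$$

Our overall approach to the solution of (5) is this. We shall arbitrarily
fix $\gamma^*$ and then use the result of section 3 to solve

$$J''(\gamma^*, \beta_{opt}) \ge J''(\gamma^*, \beta) \qquad \forall\, \beta \in \Gamma_v \times \Gamma_u \tag{5'}$$

Let $\beta_{opt}(\gamma^*)$ be the optimal controller for II when I employs the fixed $\gamma^*$.
Then fix $\beta_{opt}(\gamma^*)$ and use the result of section 3 again to solve

$$J''(\gamma_{opt}, \beta_{opt}) \le J''(\gamma, \beta_{opt}) \qquad \forall\, \gamma \in \Gamma_u \tag{5''}$$

Let $\gamma_{opt}(\beta_{opt})$ be the solution. Consistency then requires us to solve
the implicit equation

$$\gamma_{opt}(\beta_{opt}(\gamma^*)) = \gamma^* \tag{6}$$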
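The consistency requirement of Eq. (6) is a fixed point condition on a pair of best replies. As a rough illustration only, the sketch below iterates the two best replies on the scalar posterior game of section 2 rather than on the function-space problem treated here; it assumes the quadratic payoff (2.11), with `xbar` standing for the posterior mean $E(\tilde{x}/z)$, and the names `beta_opt`, `gamma_opt` and `solve_consistency` are hypothetical.

```python
def beta_opt(u, xbar):
    """Best reply of II (maximizer) to a fixed action u of I, as in Eq. (2.12)."""
    return u + xbar


def gamma_opt(v, xbar):
    """Best reply of I (minimizer) to a fixed action v of II."""
    return -(v + xbar) / 2.0


def solve_consistency(xbar, u0=0.0, tol=1e-12, max_iter=200):
    """Iterate u -> gamma_opt(beta_opt(u)) until the scalar analogue of Eq. (6) holds."""
    u = u0
    for _ in range(max_iter):
        u_next = gamma_opt(beta_opt(u, xbar), xbar)
        if abs(u_next - u) < tol:
            return u_next
        u = u_next
    raise RuntimeError("best-reply iteration did not converge")


xbar = 3.0
u_fixed = solve_consistency(xbar)
v_fixed = beta_opt(u_fixed, xbar)
print(u_fixed, v_fixed)
```

In this scalar case the composed map is a contraction with factor 1/2, so the iteration converges to $u = -\tfrac{2}{3}\bar{x}''$, $v = \tfrac{1}{3}\bar{x}''$. In the differential game the same consistency condition is resolved structurally in the discussion that follows, not by iteration.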


Thus, let $\gamma^*(\cdot, \cdot)$ be a particular strategy adopted by player I.
Let $\gamma^*(\cdot, \cdot)$ be realized by an n-dimensional linear dynamic system with
state s, input z, and output u, i.e.,

$$ds = A^* s\,dt + B^* dz, \quad s(t_0) = \bar{x}_0; \qquad u = C^* s \tag{7}$$

Then Eqs. (1), (7), (2a) and (2b) appear as a combined linear dynamic
system of dimension 2n (with states (x, s)) to player II through the
measurements (2a) and (2b). The payoff (5) for fixed $\gamma^*$ becomes

$$J'' = \tfrac{1}{2} E\Big\{ (x^T S_f\, x)_{t_f} + \int_{t_0}^{t_f} \big( x^T M x + s^T C^{*T} R\, C^* s - v^T Q v \big)\, dt \,\Big/\, Z_t, Y_t \Big\} \tag{8}$$

This is a standard LQG control problem to which results 2 and 3 of
section 3 apply directly. The optimal controller $\beta_{opt}(t, z_t, y_t)$ is given
as in Figure 2.

The combined linear dynamic system is indicated by the block ①'
enclosed in dotted line. This plays the same role as block ① in Figure 1.
The optimal controller, as in Figure 1, consists of blocks ②' and ③'.
The filtering part, block ②', computes the estimates $\hat{x}$ and $\hat{s}$. It does
this by reproducing $s(t)$ and $u(t)$ exactly since both $s(t)$ and $u(t)$ are
$Z_t$-measurable (hence $\hat{s}(t) = E(s(t)/Z_t) \equiv s(t)$, $\hat{u}(t) = E(u(t)/Z_t) \equiv u(t)$).
The conditional mean $\hat{x}(t) = E(x(t)/Z_t, Y_t)$ is computed by an
n-dimensional linear system via result 2 (Kalman-Bucy filter).


Figure 2. Optimal Controller $\beta_{opt} = \beta^*$


Block ③' is a static linear map of $\hat{x}$ and $\hat{s}$ to v similar to ③ in Figure 1,
i.e.,

$$v(t) = S_1(t)\hat{x}(t) + S_2(t)\hat{s}(t) \tag{9}$$

Now suppose II fixed his strategy at $\beta_{opt}(\gamma^*) = \beta^*$ as determined
above. We shall show that Eq. (6) precisely defines the optimal strategy
for $\gamma$. Thus $\gamma^*$, $\beta^*$ constitute a saddle point pair for (5) and
consequently solve the problem. To see this, let us consider the
combined dynamic system of Eqs. (1), (2) and blocks ②', ③' as it appears
to I. They constitute a 3n-dimensional linear dynamic system
(with states $(x, \hat{x}, s)$): 2n from Eq. (1) and blocks ②' and ③'; n from
the block with state s. Furthermore, using (9) the payoff (4) becomes


Thus I sees for fixed $\beta^*$ a standard LQG problem with 3n state variables
to which results 2 and 3 again apply.

We have

$$u(t) = K_1(t)E(x/Z_t) + K_2(t)E(\hat{x}/Z_t) + K_3(t)E(s/Z_t) \tag{11}$$

However, since all outputs of the block with state s are $Z_t$-measurable by
construction, they are deterministic as far as I is concerned. In fact, by
definition and the requirement of Eq. (6) they are also outputs of the $\gamma^*$
that we are in the process of determining. Thus they need not be estimated
or computed. The states of ①' and ②', i.e., $x$ and $\hat{x}$, can be estimated
via result 2, i.e., we have

$$x_e \triangleq E(x/Z_t), \qquad \hat{x}_e \triangleq E(\hat{x}(t)/Z_t) = E\big(E(x(t)/Z_t, Y_t)/Z_t\big) = E(x/Z_t) \equiv x_e$$

which are computable via a block ②'' by regarding ①', ②' and ③'
as a new block ①''. The state of this n-dimensional linear dynamic
system is $x_e$, which by construction and definition is precisely the state s
of the block realizing $\gamma^*$ and is the conditional mean of both $x$ and $\hat{x}$
given $Z_t$. Consequently from result 3, we conclude that the optimal control
u can be produced using a linear combination of $x_e$ only in a block ③'', i.e.,
Eq. (11) becomes $u(t) = [K_1(t) + K_2(t) + K_3(t)]\,x_e(t)$. This is shown in
Figure 3 which is simply a rearrangement of Figure 2.
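The identity $E(E(x/Z_t, Y_t)/Z_t) = E(x/Z_t)$ used above is the usual smoothing property of conditional expectation. A minimal static check, assuming the scalar Gaussian setting of section 2 (prior variance $\sigma$, measurements $z = x + w$ and $y = x + \varepsilon$ with independent unit-variance noises), can be written with exact arithmetic; the function names are illustrative only.

```python
from fractions import Fraction


def mean_given_z(z, sigma):
    """E[x | z] for x ~ N(0, sigma), z = x + w, w ~ N(0, 1)."""
    return sigma * z / (sigma + 1)


def mean_given_zy(z, y, sigma):
    """E[x | z, y] when y = x + eps is a second independent unit-noise measurement."""
    return sigma * (z + y) / (1 + 2 * sigma)


def nested_mean(z, sigma):
    """E[ E[x | z, y] | z ]: the inner mean is linear in y, and E[y | z] = E[x | z]."""
    y_mean = mean_given_z(z, sigma)
    return mean_given_zy(z, y_mean, sigma)


z, sigma = Fraction(5), Fraction(2)
print(nested_mean(z, sigma), mean_given_z(z, sigma))
```

Both expressions reduce to $\sigma z/(\sigma + 1)$, which is the static counterpart of $\hat{x}_e \equiv x_e$.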


Figure 3. Optimal Controller $\gamma^*$

Finally, it is worthwhile to clarify the meaning of the strategy $\gamma^*$ as
compared to other strategies. Let $(\gamma^\circ, \beta^\circ)$ be the minimax strategy pair
determined according to [3] (the prior commitment model). At time $t = t_0$,
if I has to make a commitment to a strategy for playing the rest of the
game, $\gamma^\circ$ certainly represents a reasonable choice (similarly for $\beta^\circ$)
since

$$J'(\gamma^*, \beta^*) \ge J'(\gamma^*, \beta^\circ) \ge J'(\gamma^\circ, \beta^\circ) \tag{12}^*$$

On the other hand, as soon as the game has progressed for some time,
we have at $t > t_0$

$$\max_{\beta \in \Gamma_u \times \Gamma_v} J''(\gamma^\circ, \beta) \ge J''(\gamma^\circ, \beta^*) \ge J''(\gamma^*, \beta^*) \tag{13}$$

From the vantage point of I at t, $\gamma^\circ$ becomes a rather unsafe strategy
for the rest of the game compared to $\gamma^*$. To be sure, we still have
$J''(\gamma^*, \beta^*) \ge J''(\gamma^\circ, \beta^\circ)$. But there is no compelling reason to believe that
II will definitely play $\beta^\circ$, as explained in section 2. Conceptually, at
$t > t_0$, we use $(\gamma^*, \beta^*)$ for the purpose of determining $u(t)$ only. At $t' > t$,
we have a different $J''$ based on new information and a different minimax
game to solve. A different $(\gamma^*, \beta^*)$ will be used to determine $u(t')$. In
general, this would require the solution of a TPZSG for each t. However,
in the LQG game being discussed here, a great practical simplification
occurs due to the fact that the parameters of $\gamma^*$, $\beta^*$, i.e., $S_1$, $S_2$ in Eq. (9),
$K_1$, $K_2$, $K_3$ in Eq. (11) (see also Eqs. (5.9), (5.11), (5.15), (5.17) in the next
section), are completely independent of $z_t$ and $y_t$. Consequently, they
can in fact be computed beforehand. In other words, the different
$(\gamma^*, \beta^*)$ pairs I determines for each $t \ge t_0$ are in fact independent of the
actual t. Note, however, this does not mean that we advocate I should
commit himself to $\gamma^*$ beforehand. Conceptually, he uses $\gamma^*$ at t to

*Note this is different from deciding what value to use for $u(t_0)$, $v(t_0)$.
In fact $(\gamma^\circ, \beta^\circ)$ and $(\gamma^*, \beta^*)$ will produce the same $u(t_0)$ since
v  _  o 




compute $u(t)$ only. He then re-solves for $\gamma^*$ at each different t and uses
the new (but identical) $\gamma^*$ to compute the new $u(t)$. In practice, what this
means is that he must have secrecy if he decides (i.e., commits
himself) to adopt the posterior strategy $\gamma^*$. He should convince his
opponent that his decisions are made as the need arises and that all
his options are open at all times. If no secrecy is possible and he must
announce his strategy beforehand, then $\gamma^\circ$ should be his choice.

Note that under the fictitious* saddle point condition when $(\gamma^*, \beta^*)$
are employed, the block with state $\hat{s}$ and $\gamma^*$ are identical, as are the outputs
$s$, $\hat{s}$ and $u$, $\hat{u}$. Of course, if we choose to use a different $\beta \neq \beta^*$ by,
say, using a $\gamma' \neq \gamma^*$, in such a case $\hat{u} \neq u$ and $\hat{s} \neq s$, and $\hat{x}$, $x_e$ can no
longer be interpreted as conditional means. However, $J''(\gamma^*, \beta^*) \ge J''(\gamma^*, \beta)$
in this case by the derivation just given. Consequently, the minimax
security level of (5) is achieved when we render $\gamma$ such that the $\gamma$ block
is identical to the corresponding block in Figure 3. In other words, under
the conditions stated, the worst that II can do to I is to use the strategy $\beta^*$,
and the best counter strategy is $\gamma^*$ with $J''(\gamma^*, \beta^*)$ the security level
at time t. Of course, in real life when II does not have available
both the information $z(t)$ and $y(t)$, I can probably expect better returns
than $J''(\gamma^*, \beta^*)$.

5.  Existence  Questions  and  a  Simple  Example 

So far we have not addressed ourselves to the question of existence
of the solution which was derived in the previous section. Since the
solution is obtained by solving a pair of coupled stochastic control
problems (Eqs. (4.5'), (4.5'') and (4.6)), the existence question is
directly dependent on the existence of solutions of a set of coupled
Riccati equations associated with the control problems. The explicit
form of these Riccati equations, while straightforward to write down,
is rather cumbersome notationally in the general case. Nor is it
possible to state simple and meaningful sufficient conditions to guarantee
the existence of the solutions to these DEs. What we propose to do in
this section is to carry out the derivation of the explicit solution for a
very simple problem to show the various equations involved. The
procedure is completely similar in the general case.

*Fictitious in the sense that this game is solved only for the purpose of
computing I's security payoff.


Let the scalar dynamic system and observations be

$$\dot{x} = u + v, \qquad x(t_0) \sim N(\bar{x}_0, p_0) \tag{1}$$

$$dz = x\,dt + d\xi \tag{2}$$

$$dy = x\,dt + d\varepsilon \tag{3}$$

where $\xi$, $\varepsilon$ are statistically independent standard Wiener processes
with zero mean and variance $t - t_0$, and payoff

$$J = \tfrac{1}{2} x^2(t_f) + \tfrac{1}{2} \int_{t_0}^{t_f} (u^2 - 2v^2)\,dt \tag{4}$$

Let $\gamma^*$ be given by

$$ds = a s\,dt + b\,dz \tag{5}$$

$$u = c s \tag{6}$$

where a, b, and c are parameters to be determined. From I's viewpoint
of a secure strategy, II maximizes at $t \ge t_0$

$$E[J/Z_t, Y_t] = E\Big[ \tfrac{1}{2} x^2(t_f) + \tfrac{1}{2} \int_{t_0}^{t_f} (c^2 s^2 - 2v^2)\,dt \,\Big/\, Z_t, Y_t \Big] \tag{7}$$

subject to (1), (5), and (6). Using standard LQG results, we get for
all $t \ge t_0$

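$$\begin{aligned} d\hat{x} &= (c\hat{s} + v)\,dt + p\,(dz + dy - 2\hat{x}\,dt), & \hat{x}(t_0) &= \bar{x}_0 \\ d\hat{s} &= a\hat{s}\,dt + b\,dz, & \hat{s}(t_0) &= \bar{x}_0 \end{aligned} \tag{8}$$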


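where

$$\dot{p} = -2p^2, \qquad p(t_0) = p_0 \tag{9}$$

and the control

$$v = \tfrac{1}{2}\left[ S_{11}(t)\hat{x} + S_{12}(t)\hat{s} \right] \tag{10}$$

where

$$S(t) \triangleq \begin{bmatrix} S_{11} & S_{12} \\ S_{12} & S_{22} \end{bmatrix}$$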


-Eil  3- 


(ID 


$$S(t_f) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$


Eqs. (8-11) define $\beta_{opt}(\gamma^*)$. Now from the viewpoint of I, $\beta_{opt}$ and (1)
define a 3n-dimensional linear dynamic system

dx  =  (-j  SjjX  +-j  f3j 2 8  +  u)dt  x(t( 


ds  -  asdt  +  bdt 
with  a  payoff  at  time  t  >  t 


N(x  ,  P 

) 

0  0 

+  p  dt 

A  A 

x(to)  >  x2 

A  A 

St  -  X 

O  0 

(12) 
to be minimized. Once again using standard results, we first compute the
conditional means of $x(t)$ and $\hat{x}(t)$, as $x_e(t)$ and $\hat{x}_e(t)$. Note that since s is
$Z_t$-measurable we have

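$$dx_e = \left( \tfrac{1}{2} S_{11}\hat{x}_e + \tfrac{1}{2} S_{12} s + u \right) dt + \Sigma_{11}(t)\,(dz - x_e\,dt), \qquad x_e(t_0) = \bar{x}_0 \tag{14a}$$

$$d\hat{x}_e = \left( p\,x_e + (\tfrac{1}{2} S_{11} - 2p)\,\hat{x}_e + \tfrac{1}{2} S_{12} s + u \right) dt + p\,dz + \Sigma_{12}(t)\,(dz - x_e\,dt) \tag{14b}$$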


where 


r  2 

Si 

Vll 

r  - 
j  0  0 

V  ,v  » 

r  2 

’  1  2 
0  P 

s  ret  i  = 

'  0 

o  o_ 

Ml  12 

12 

l_v  J 

Now  setting  bv  definition  s  -  s  *  and  noting  the  easily  checked  identity 
£,|(S)  i  +  l\  5't2(t)  £  ^U)  •  Wl’  ra"  vefifV  that 

$$x_e(t) \equiv \hat{x}_e(t) \equiv E(x(t)/Z_t)$$

and we have finally
$$b = \Sigma_{11} \tag{19}$$

$$a = \tfrac{1}{2}(S_{11} + S_{12}) - b + c \tag{20}$$

If we substitute Eqs. (18-20) into Eqs. (9), (11), (15'), and (17), they
form a set of coupled nonlinear differential equations of the Riccati type.
Their solution completely specifies the secure strategy $\gamma^*$ via Eqs. (5),
(6), (18-20). Consequently, the existence of $\gamma^*$ is equivalent to the
existence of solutions of Eqs. (9), (11), (15') and (17).
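The role of Eqs. (19) and (20) can be illustrated numerically: once $s \equiv x_e \equiv \hat{x}_e$ and $u = cs$, the estimator equation for $x_e$ must reproduce the realization (5) of $\gamma^*$. The sketch below assumes the estimator has the form $dx_e = (\tfrac{1}{2}S_{11}\hat{x}_e + \tfrac{1}{2}S_{12}s + u)\,dt + \Sigma_{11}(dz - x_e\,dt)$, which is this editor's reading of Eq. (14a), and compares one Euler step of it with one Euler step of (5). The numerical values are arbitrary and the function names are hypothetical.

```python
def matched_parameters(S11, S12, Sigma11, c):
    """Parameters (a, b) of Eq. (5) according to Eqs. (19) and (20)."""
    b = Sigma11
    a = 0.5 * (S11 + S12) - b + c
    return a, b


def step_estimator(x_e, s, dz, dt, S11, S12, Sigma11, c):
    """One Euler step of the assumed form of Eq. (14a), with u = c s and xhat_e = x_e."""
    u = c * s
    drift = 0.5 * S11 * x_e + 0.5 * S12 * s + u
    return x_e + drift * dt + Sigma11 * (dz - x_e * dt)


def step_strategy(s, dz, dt, a, b):
    """One Euler step of the realization ds = a s dt + b dz of Eq. (5)."""
    return s + a * s * dt + b * dz


S11, S12, Sigma11, c = 0.4, -0.2, 0.7, -0.5   # arbitrary illustrative values
a, b = matched_parameters(S11, S12, Sigma11, c)
s0, dz, dt = 1.3, 0.05, 0.01
print(step_estimator(s0, s0, dz, dt, S11, S12, Sigma11, c), step_strategy(s0, dz, dt, a, b))
```

With $a$ and $b$ chosen this way the two updates agree whenever they start from the same state, so the state of the $\gamma^*$ block stays equal to the conditional mean.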

6. Practical Implications, Open Problems, and Conclusions

There are several implications of the results obtained that are
worth further discussion.

First of all, it should be understood that the strategy we derived
for u, i.e., $\gamma^*$, is secure only with respect to a set of assumptions
which we assert to be reasonable. Roughly speaking, we allow our
opponent to know everything that we may know. This seems to be as
pessimistic an assumption as one would like to use. It appears paranoid
to assume that the other player can have access to knowledge concerning
the choices of Nature, i.e., values of $\xi(t)$, $\varepsilon(t)$, $w(t)$, etc., beyond the
probabilistic knowledge that is already permitted in the statement of
the original problem. Our assumption is also in line with other approaches
to the control of uncertain systems [6, 7]. They have taken the
viewpoint that such problems may be regarded as a game against an
opponent (Nature) where the upper value of the game is sought. In other
words, the opponent (Nature or uncertainty) makes the moves knowing
everything you have known and/or have done.

In this respect, the derived solution has an additional appealing
feature. Consider the linear stochastic dynamic system

$$dx = (Ax + Bu)\,dt + C\,dw + v\,dt \tag{1}$$

$$dz = Hx\,dt + d\xi \tag{2}$$

where $v(t)$ represents terms which arise due to approximations and
inaccuracies in modelling of the real (probably nonlinear and nongaussian)
system. Now if we consider a payoff


(uaRu)dt]  (3) 

o 


and a size-of-approximation constraint


X 


(4) 


then the results in section 4 state that a "good" control law in this
situation is an n-dimensional linear dynamic system followed by a zero
memory linear map. This explains the almost unbelievable robustness
of the structure of the well known optimal control law (Results 2 and 3)
in widely diverse applications where linearity or gaussianness has
been clearly violated. In other words, except for parameter values,
the linear structure of $\gamma^*$ remains appropriate (i.e., safe) in highly
nonlinear and poorly defined situations. In fact, the above discussion
implies that "optimal" stochastic control of nonlinear systems can now
be attempted by finite dimensional optimization on the parameters of
$\gamma^*$. The engineering significance of this cannot be overstated.


The recognition that in the delayed commitment mode all stochastic
games in extensive form are nonzero sum raises interesting problems as
well as possibilities. In this report we have only explored two solution
concepts associated with NZS games, namely, Nash equilibrium and
individual minimax solutions. There are many other solution concepts
involving bargaining, coalitions, etc. For example, we can visualize
that the two players may wish to enter into information exchange during
the play of the game.